In this RMarkdown document I analyse the “Full Rabbit Dataset.csv”, which is the main focus of the project. This data set contains point occurrence counts of invasive European rabbits (Oryctolagus cuniculus) in Australia. These point counts were collated from various direct and indirect studies, as well as citizen science observations, over the period 1760 to 2015 and form part of the long-term rabbit data set (Roy-Dufresne et al. 2019). The data set also contains a suite of environmental variables as well as the presence-absence data from the “Species Pseudoabsence Generation.Rmd” document. Finally, various abundance estimates were derived from the data provided by the database and from the “Estimating Rabbit Abundance.Rmd” document.
Variable names:
The 13 vegetation types were a re-classification of all the vegetation types in Australia as described by the Environment Department of the Australian Government classification scheme and the original classes can be found at “https://www.awe.gov.au/agriculture-land/land/native-vegetation/national-vegetation-information-system/data-products”. The re-classifications used in the data set are found below:
The factor levels in Diseases are coded 0-3; what each number corresponds to is given below:
As Season is coded according to the Australian calendar, the numeric coding of 1-4 represents different seasons than it would for a Northern Hemisphere nation:
The state variable refers to the 8 states/territories of Australia coded with their 2/3-letter coding with the full names of each state/territory given below:
Finally, in all the animal presence/absence factors 1 represents presences and 0 represents absences.
The aim of the project is to determine the drivers of rabbit occurrence patterns, given the variables in the data set, at different spatial scales and to compare the variables retained in each final model. The scales are the country scale, the state/territory scale and the transect scale. The transect scale will consist of random transects sampled from the data on a North-South axis and an East-West axis.
I will start by loading the R packages that I will be using for this analysis.
library(mgcv)
## Loading required package: nlme
## This is mgcv 1.8-36. For overview type 'help("mgcv-package")'.
library(ggplot2)
library(ggcleveland)
## Warning: package 'ggcleveland' was built under R version 4.1.2
library(patchwork)
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.1.2
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v tibble 3.1.6 v dplyr 1.0.7
## v tidyr 1.1.4 v stringr 1.4.0
## v readr 2.1.1 v forcats 0.5.1
## v purrr 0.3.4
## Warning: package 'tibble' was built under R version 4.1.2
## Warning: package 'readr' was built under R version 4.1.2
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::collapse() masks nlme::collapse()
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(effects)
## Warning: package 'effects' was built under R version 4.1.2
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.1.2
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
library(car)
## Warning: package 'car' was built under R version 4.1.2
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
## The following object is masked from 'package:purrr':
##
## some
There are also some custom functions that I want to create that will be helpful for graphical data analysis and plotting with ggplot2.
#Augmented pairs plot
panel.hist = function(x, ...) {
usr = par("usr"); on.exit(par(usr))
par(usr = c(usr[1:2], 0, 2.5))
hist(x, freq = FALSE, col="cyan", add=TRUE)
lines(density(x))
}
panel.cor = function(x, y, digits = 2, prefix = "", cex.cor, ...){
usr = par("usr"); on.exit(par(usr))
par(usr = c(0, 1, 0, 1))
r = abs(cor(x, y))
txt = format(c(r, 0.123456789), digits = digits)[1]
txt = paste0(prefix, txt)
if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
text(0.5, 0.5, txt, cex = cex.cor * r)
}
pairs2 = function (x) {
pairs(x, lower.panel = panel.smooth, upper.panel = panel.cor, diag.panel = panel.hist)
}
#Co-plot panel function
coplot.ablines = function(x, y, ...){
tmp = lm(y ~ x, na.action = na.omit)
abline(tmp)
points(x, y)
}
#Custom ggplot theme
theme_customized = function(base_size = 13, base_family = "", base_line_size = base_size/22, base_rect_size = base_size/22){
theme(
axis.title = element_text(size = 13),
axis.text.x = element_text(size = 10),
axis.text.y = element_text(size = 10),
plot.caption = element_text(size = 10, face = "italic"),
panel.background = element_rect(fill = "white"),
axis.line = element_line(size = 1, colour = "black"),
strip.background = element_rect(fill = "#cddcdd"),
panel.border = element_rect(colour = "black", fill = NA, size = 0.5),
strip.text = element_text(colour = "black"),
legend.key = element_blank()
)
}
I would also like to note that the custom R functions above were provided to me during statistics courses run by Dr Alex Douglas and Dr Thomas Cornulier at the University of Aberdeen.
Now let’s finally import the data.
Rabbit = read.table("E:/Masters Project/BI5002 (Masters Project)/Invasive European Rabbit Data/Full Rabbit Dataset.csv",
header = TRUE, stringsAsFactors = TRUE, sep = ",")
str(Rabbit)
## 'data.frame': 689265 obs. of 37 variables:
## $ Occurrence_ID : int 683808 683809 684986 684987 684988 686921 686922 686923 686924 686925 ...
## $ Lat : num -37.1 -38.4 -36.8 -36.5 -36.6 ...
## $ Long : num 148 145 147 145 147 ...
## $ Occurences : int 33 24 5 2 7 159 7 57 57 111 ...
## $ Abund.1 : int 33 24 5 2 7 159 7 57 57 111 ...
## $ Abund.2 : num NA NA NA NA NA NA NA NA NA NA ...
## $ Abund.3 : num 5.14e-04 3.74e-04 7.78e-05 3.11e-05 1.09e-04 ...
## $ No.of.10km.cells : int 6425 6425 6425 6425 6425 6425 6425 6425 6425 6425 ...
## $ Year : int 1760 1760 1760 1760 1760 1760 1760 1760 1760 1760 ...
## $ Day : int NA NA NA NA NA NA NA NA NA NA ...
## $ A_Prec_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_Psea_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_TAvg_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_TMax_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_TMin_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_TSea_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_TWet_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_TWrm_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_Prec_AvgAutumn30Yr: num NA NA NA NA NA NA NA NA NA NA ...
## $ A_Prec_AvgSummer30Yr: num NA NA NA NA NA NA NA NA NA NA ...
## $ A_Prec_AvgSpring30Yr: num NA NA NA NA NA NA NA NA NA NA ...
## $ A_Prec_AvgWinter30Yr: num NA NA NA NA NA NA NA NA NA NA ...
## $ DistPermWater : num NA NA NA NA NA NA NA NA NA NA ...
## $ DistAgriLand : num NA NA NA NA NA NA NA NA NA NA ...
## $ PercSoilClay : num NA NA NA NA NA NA NA NA NA NA ...
## $ MinDayLength : num NA NA NA NA NA NA NA NA NA NA ...
## $ VarDayLength : num NA NA NA NA NA NA NA NA NA NA ...
## $ State : Factor w/ 8 levels "ACT","NSW","NT",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ VegeType : int NA NA NA NA NA NA NA NA NA NA ...
## $ Season : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Month : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Diseases : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Red.Fox : int 1 1 1 1 1 1 0 0 0 0 ...
## $ Dingo : int 1 1 1 1 1 1 0 0 0 0 ...
## $ Feral.Cat : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Whistling.Kite : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Wallaby.Sp : int 1 1 1 1 1 1 1 1 1 1 ...
head(Rabbit, n = 10)
tail(Rabbit, n = 10)
The categorical variables have been read in as integers rather than factors, so I will need to convert them; everything else looks okay with the data frame.
#Convert the categorical variables to factors
Rabbit$VegeType = factor(Rabbit$VegeType)
Rabbit$Season = factor(Rabbit$Season)
Rabbit$Month = factor(Rabbit$Month)
Rabbit$Diseases = factor(Rabbit$Diseases)
Rabbit$Red.Fox = factor(Rabbit$Red.Fox)
Rabbit$Dingo = factor(Rabbit$Dingo)
Rabbit$Feral.Cat = factor(Rabbit$Feral.Cat)
Rabbit$Whistling.Kite = factor(Rabbit$Whistling.Kite)
Rabbit$Wallaby.Sp = factor(Rabbit$Wallaby.Sp)
#Re-check the data set
str(Rabbit)
## 'data.frame': 689265 obs. of 37 variables:
## $ Occurrence_ID : int 683808 683809 684986 684987 684988 686921 686922 686923 686924 686925 ...
## $ Lat : num -37.1 -38.4 -36.8 -36.5 -36.6 ...
## $ Long : num 148 145 147 145 147 ...
## $ Occurences : int 33 24 5 2 7 159 7 57 57 111 ...
## $ Abund.1 : int 33 24 5 2 7 159 7 57 57 111 ...
## $ Abund.2 : num NA NA NA NA NA NA NA NA NA NA ...
## $ Abund.3 : num 5.14e-04 3.74e-04 7.78e-05 3.11e-05 1.09e-04 ...
## $ No.of.10km.cells : int 6425 6425 6425 6425 6425 6425 6425 6425 6425 6425 ...
## $ Year : int 1760 1760 1760 1760 1760 1760 1760 1760 1760 1760 ...
## $ Day : int NA NA NA NA NA NA NA NA NA NA ...
## $ A_Prec_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_Psea_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_TAvg_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_TMax_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_TMin_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_TSea_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_TWet_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_TWrm_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_Prec_AvgAutumn30Yr: num NA NA NA NA NA NA NA NA NA NA ...
## $ A_Prec_AvgSummer30Yr: num NA NA NA NA NA NA NA NA NA NA ...
## $ A_Prec_AvgSpring30Yr: num NA NA NA NA NA NA NA NA NA NA ...
## $ A_Prec_AvgWinter30Yr: num NA NA NA NA NA NA NA NA NA NA ...
## $ DistPermWater : num NA NA NA NA NA NA NA NA NA NA ...
## $ DistAgriLand : num NA NA NA NA NA NA NA NA NA NA ...
## $ PercSoilClay : num NA NA NA NA NA NA NA NA NA NA ...
## $ MinDayLength : num NA NA NA NA NA NA NA NA NA NA ...
## $ VarDayLength : num NA NA NA NA NA NA NA NA NA NA ...
## $ State : Factor w/ 8 levels "ACT","NSW","NT",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ VegeType : Factor w/ 13 levels "1","2","3","4",..: NA NA NA NA NA NA NA NA NA NA ...
## $ Season : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
## $ Month : Factor w/ 20 levels "0","1","2","3",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ Diseases : Factor w/ 4 levels "0","1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
## $ Red.Fox : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 1 1 1 1 ...
## $ Dingo : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 1 1 1 1 ...
## $ Feral.Cat : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## $ Whistling.Kite : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## $ Wallaby.Sp : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
summary(Rabbit)
## Occurrence_ID Lat Long Occurences
## Min. : 1 Min. :-43.49 Min. :113.0 Min. : 1
## 1st Qu.:172317 1st Qu.:-34.15 1st Qu.:139.1 1st Qu.: 5779
## Median :344633 Median :-34.15 Median :139.2 Median : 45089
## Mean :344634 Mean :-34.02 Mean :140.1 Mean : 44618
## 3rd Qu.:516949 3rd Qu.:-33.25 3rd Qu.:139.4 3rd Qu.: 69866
## Max. :689285 Max. :-12.35 Max. :153.7 Max. :104045
##
## Abund.1 Abund.2 Abund.3 No.of.10km.cells
## Min. : 1 Min. : 0.0 Min. : 0.0 Min. : 1.0
## 1st Qu.: 8072 1st Qu.: 0.0 1st Qu.: 216.7 1st Qu.: 2.0
## Median : 45089 Median : 1.0 Median :2254.4 Median : 2.0
## Mean : 45553 Mean : 7.0 Mean :2202.2 Mean : 444.2
## 3rd Qu.: 69866 3rd Qu.: 6.0 3rd Qu.:3493.3 3rd Qu.: 2.0
## Max. :104045 Max. :546.7 Max. :5202.2 Max. :6425.0
## NA's :16774 NA's :638089
## Year Day A_Prec_Avg30Yr A_Psea_Avg30Yr
## Min. :1760 Min. : 1.00 Min. : 135.5 Min. : 9.73
## 1st Qu.:2007 1st Qu.: 9.00 1st Qu.: 262.6 1st Qu.: 23.40
## Median :2009 Median :13.00 Median : 336.8 Median : 27.34
## Mean :2007 Mean :16.06 Mean : 379.9 Mean : 29.87
## 3rd Qu.:2009 3rd Qu.:24.00 3rd Qu.: 423.0 3rd Qu.: 36.36
## Max. :2015 Max. :31.00 Max. :3270.0 Max. :137.75
## NA's :16774 NA's :70248 NA's :39295 NA's :39295
## A_TAvg_Avg30Yr A_TMax_Avg30Yr A_TMin_Avg30Yr A_TSea_Avg30Yr
## Min. : 4.83 Min. :15.15 Min. :-4.95 Min. :158.7
## 1st Qu.:14.78 1st Qu.:29.55 1st Qu.: 3.04 1st Qu.:472.2
## Median :15.66 Median :30.38 Median : 4.13 Median :478.1
## Mean :15.52 Mean :30.28 Mean : 3.83 Mean :491.1
## 3rd Qu.:16.14 3rd Qu.:31.36 3rd Qu.: 4.61 3rd Qu.:527.7
## Max. :28.21 Max. :41.84 Max. :18.46 Max. :676.2
## NA's :39295 NA's :39295 NA's :39295 NA's :39295
## A_TWet_Avg30Yr A_TWrm_Avg30Yr A_Prec_AvgAutumn30Yr A_Prec_AvgSummer30Yr
## Min. : 1.06 Min. : 9.80 Min. : 8.11 Min. : 5.31
## 1st Qu.: 9.82 1st Qu.:20.88 1st Qu.: 18.71 1st Qu.: 22.25
## Median :12.01 Median :21.69 Median : 22.84 Median : 23.52
## Mean :12.24 Mean :21.66 Mean : 26.84 Mean : 27.84
## 3rd Qu.:13.72 3rd Qu.:22.36 3rd Qu.: 29.75 3rd Qu.: 24.23
## Max. :32.50 Max. :33.11 Max. :447.45 Max. :474.84
## NA's :39295 NA's :39295 NA's :39295 NA's :39295
## A_Prec_AvgSpring30Yr A_Prec_AvgWinter30Yr DistPermWater DistAgriLand
## Min. : 0.85 Min. : 1.23 Min. : 0.000 Min. : 0.00
## 1st Qu.: 27.68 1st Qu.: 24.45 1st Qu.: 1.780 1st Qu.: 1.57
## Median : 32.78 Median : 35.97 Median : 3.360 Median : 6.76
## Mean : 37.35 Mean : 38.33 Mean : 3.897 Mean : 21.99
## 3rd Qu.: 42.73 3rd Qu.: 48.52 3rd Qu.: 4.010 3rd Qu.: 20.24
## Max. :251.57 Max. :286.80 Max. :170.370 Max. :915.02
## NA's :39295 NA's :39295 NA's :1073 NA's :1073
## PercSoilClay MinDayLength VarDayLength State
## Min. : 5.00 Min. : 8.950 Min. :0.250 QLD :543602
## 1st Qu.:20.90 1st Qu.: 9.870 1st Qu.:2.310 VIC :105780
## Median :28.10 Median : 9.870 Median :2.480 WA : 18385
## Mean :25.29 Mean : 9.878 Mean :2.478 ACT : 8644
## 3rd Qu.:29.99 3rd Qu.: 9.950 3rd Qu.:2.480 SA : 6394
## Max. :57.55 Max. :11.400 Max. :4.940 NT : 3498
## NA's :1078 NA's :1047 NA's :1047 (Other): 2962
## VegeType Season Month Diseases Red.Fox
## 11 :413419 1 :142960 4 :155686 0 : 278 0 : 121
## 10 :113955 2 :289616 3 :113069 1 : 26021 1 : 66993
## 3 :113598 3 :160038 7 :110078 2 :643287 NA's:622151
## 4 : 13507 4 : 69582 1 : 92315 3 : 2905
## 2 : 11864 NA's: 27069 2 : 39119 NA's: 16774
## (Other): 21795 (Other):153342
## NA's : 1127 NA's : 25656
## Dingo Feral.Cat Whistling.Kite Wallaby.Sp
## 0 : 121 0 : 121 0 : 121 0 : 121
## 1 : 13871 1 : 13816 1 :144911 1 : 73930
## NA's:675273 NA's:675328 NA's:544233 NA's:615214
##
##
##
##
In order to look at the drivers of rabbit occurrences at different scales I need to subset the data to make new data frames based on these scales. First I will create a subset data frame for each Australian state/territory, naming each data frame “Rabbit_state/territory”.
Rabbit_ACT = Rabbit[Rabbit$State == "ACT", ]
Rabbit_NSW = Rabbit[Rabbit$State == "NSW", ]
Rabbit_NT = Rabbit[Rabbit$State == "NT", ]
Rabbit_QLD = Rabbit[Rabbit$State == "QLD", ]
Rabbit_SA = Rabbit[Rabbit$State == "SA", ]
Rabbit_TAS = Rabbit[Rabbit$State == "TAS", ]
Rabbit_VIC = Rabbit[Rabbit$State == "VIC", ]
Rabbit_WA = Rabbit[Rabbit$State == "WA", ]
A point of note to myself: the Tasmania data set only has 365 observations whilst the other 7 data sets have thousands of observations. Given the number of potential predictors, this may limit the number of parameters that can be fitted in statistical models for Tasmania, whereas there should not be any such limitation for the other state/territory-specific data sets.
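The subsets and their sizes can also be produced in one step. The following is a minimal sketch on a small made-up data frame (the `toy` object and its values are illustrative assumptions, not the rabbit data). `split()` returns a named list with one data frame per level of `State`, and `table()` gives the row counts that flag a small subset.

```r
# Toy stand-in for the Rabbit data frame (values are made up for illustration)
toy = data.frame(
  State = factor(c("ACT", "ACT", "TAS", "VIC", "VIC", "VIC")),
  Occurences = c(33, 24, 5, 2, 7, 159)
)

# One data frame per state/territory, held in a named list
by_state = split(toy, toy$State)

# Number of observations available in each subset
n_by_state = table(toy$State)
n_by_state

# Individual subsets are accessed by name
nrow(by_state$TAS)
```

Keeping the subsets in a list rather than as separate objects makes it easier to apply the same plotting or modelling step to every state/territory with `lapply()`.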
Creating the transect-scale data sets will be trickier. A North-South transect runs along a line of fixed longitude, so to create the North-South transects I need to randomly sample points across latitudes given a fixed longitude. Likewise, to create the East-West transects I need to randomly sample points across longitudes given a fixed latitude. To create transects of equal size, the fixed longitudes and latitudes can instead be a range whose width corresponds to a set physical distance measured in metres.
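To make the idea of a range with a set physical width concrete, a width in kilometres can be converted to a half-width in degrees. This is a rough sketch that assumes a spherical Earth with roughly 111 km per degree of latitude, and a degree of longitude that shrinks with the cosine of the latitude. The function names and the 50 km width are hypothetical choices for illustration.

```r
km_per_degree = 111  # approximate length of one degree of latitude (spherical assumption)

# Half-width in degrees of latitude for a band of a given total width in km
lat_halfwidth = function(width_km) {
  (width_km / 2) / km_per_degree
}

# Half-width in degrees of longitude; a degree of longitude is shorter away from the equator
long_halfwidth = function(width_km, lat_deg) {
  (width_km / 2) / (km_per_degree * cos(lat_deg * pi / 180))
}

lat_halfwidth(50)        # band of latitude 50 km wide
long_halfwidth(50, -34)  # band of longitude 50 km wide, near 34 degrees South
```

Because the longitude conversion depends on latitude, a band of constant width in degrees of longitude is physically narrower in the south of Australia than in the north.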
First, some starting longitudes and latitudes to make the transects from.
#Most Eastern and Western Points
Long_max = max(Rabbit$Long, na.rm = TRUE)
Long_min = min(Rabbit$Long, na.rm = TRUE)
Long_max
## [1] 153.65
Long_min
## [1] 113.05
#Most Southern and Northern Points
Lat_min = min(Rabbit$Lat, na.rm = TRUE)
Lat_max = max(Rabbit$Lat, na.rm = TRUE)
Lat_min
## [1] -43.49
Lat_max
## [1] -12.35
As there is a difference of approximately 40 units between Long_max and Long_min, I will make 8 North-South transects with a difference in longitude of approximately 5 units. For latitude there is a difference of approximately 31 units between Lat_max and Lat_min, so I will make 6 East-West transects with a difference in latitude of approximately 5 units.
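The spacing can be written out with `seq()`. A short sketch using the extremes printed earlier; the object names `NS_longs` and `EW_lats` are mine, and placing the last North-South transect at `Long_min` itself, rather than a further 5 units west, is an assumption made so that 8 transects span the full range.

```r
Long_max = 153.65; Long_min = 113.05
Lat_max = -12.35; Lat_min = -43.49

# Range covered in each direction
Long_max - Long_min  # about 40.6 degrees of longitude
Lat_max - Lat_min    # about 31.1 degrees of latitude

# Seven longitudes stepping west from Long_max in 5-degree steps, plus Long_min itself
NS_longs = c(seq(Long_max, by = -5, length.out = 7), Long_min)

# Six latitudes stepping south from Lat_max in 5-degree steps
EW_lats = seq(Lat_max, by = -5, length.out = 6)

NS_longs
EW_lats
```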
Now we can make the transect-level data sets.
#North-South Transects
NS1 = Rabbit[sample(nrow(Rabbit[Rabbit$Long == 153.65, ]), 1000, replace = TRUE), ]
NS2 = Rabbit[sample(nrow(Rabbit[Rabbit$Long == 153.65 - 5.00, ]), 1000, replace = TRUE), ]
NS3 = Rabbit[sample(nrow(Rabbit[Rabbit$Long == 153.65 - 10.00, ]), 1000, replace = TRUE), ]
NS4 = Rabbit[sample(nrow(Rabbit[Rabbit$Long == 153.65 - 15.00, ]), 1000, replace = TRUE), ]
NS5 = Rabbit[sample(nrow(Rabbit[Rabbit$Long == 153.65 - 20.00, ]), 1000, replace = TRUE), ]
NS6 = Rabbit[sample(nrow(Rabbit[Rabbit$Long == 153.65 - 25.00, ]), 1000, replace = TRUE), ]
NS7 = Rabbit[sample(nrow(Rabbit[Rabbit$Long == 153.65 - 30.00, ]), 1000, replace = TRUE), ]
NS8 = Rabbit[sample(nrow(Rabbit[Rabbit$Long == 113.05, ]), 1000, replace = TRUE), ]
NS = rbind(NS1, NS2, NS3, NS4, NS5, NS6, NS7, NS8)
NS = as.data.frame(NS)
rm(NS1, NS2, NS3, NS4, NS5, NS6, NS7, NS8)
#East-West Transects
EW1 = Rabbit[sample(nrow(Rabbit[Rabbit$Lat == -12.35, ]), 1000, replace = TRUE), ]
EW2 = Rabbit[sample(nrow(Rabbit[Rabbit$Lat == -12.35, ]), 1000, replace = TRUE), ]
EW3 = Rabbit[sample(nrow(Rabbit[Rabbit$Lat == -12.35, ]), 1000, replace = TRUE), ]
EW4 = Rabbit[sample(nrow(Rabbit[Rabbit$Lat == -12.35, ]), 1000, replace = TRUE), ]
EW5 = Rabbit[sample(nrow(Rabbit[Rabbit$Lat == -12.35, ]), 1000, replace = TRUE), ]
EW6 = Rabbit[sample(nrow(Rabbit[Rabbit$Lat == -12.35, ]), 1000, replace = TRUE), ]
EW = rbind(EW1, EW2, EW3, EW4, EW5, EW6)
EW = as.data.frame(EW)
rm(EW1, EW2, EW3, EW4, EW5, EW6)
#Add Transect Variables
x = factor(rep(letters[1:8], each = 1000))
y = factor(rep(letters[1:6], each = 1000))
NS$Transect = x
EW$Transect = y
str(NS)
## 'data.frame': 8000 obs. of 38 variables:
## $ Occurrence_ID : int 684986 683808 684986 683809 683809 683808 684986 684986 684987 683809 ...
## $ Lat : num -36.8 -37.1 -36.8 -38.4 -38.4 ...
## $ Long : num 147 148 147 145 145 ...
## $ Occurences : int 5 33 5 24 24 33 5 5 2 24 ...
## $ Abund.1 : int 5 33 5 24 24 33 5 5 2 24 ...
## $ Abund.2 : num NA NA NA NA NA NA NA NA NA NA ...
## $ Abund.3 : num 7.78e-05 5.14e-04 7.78e-05 3.74e-04 3.74e-04 ...
## $ No.of.10km.cells : int 6425 6425 6425 6425 6425 6425 6425 6425 6425 6425 ...
## $ Year : int 1760 1760 1760 1760 1760 1760 1760 1760 1760 1760 ...
## $ Day : int NA NA NA NA NA NA NA NA NA NA ...
## $ A_Prec_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_Psea_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_TAvg_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_TMax_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_TMin_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_TSea_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_TWet_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_TWrm_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_Prec_AvgAutumn30Yr: num NA NA NA NA NA NA NA NA NA NA ...
## $ A_Prec_AvgSummer30Yr: num NA NA NA NA NA NA NA NA NA NA ...
## $ A_Prec_AvgSpring30Yr: num NA NA NA NA NA NA NA NA NA NA ...
## $ A_Prec_AvgWinter30Yr: num NA NA NA NA NA NA NA NA NA NA ...
## $ DistPermWater : num NA NA NA NA NA NA NA NA NA NA ...
## $ DistAgriLand : num NA NA NA NA NA NA NA NA NA NA ...
## $ PercSoilClay : num NA NA NA NA NA NA NA NA NA NA ...
## $ MinDayLength : num NA NA NA NA NA NA NA NA NA NA ...
## $ VarDayLength : num NA NA NA NA NA NA NA NA NA NA ...
## $ State : Factor w/ 8 levels "ACT","NSW","NT",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ VegeType : Factor w/ 13 levels "1","2","3","4",..: NA NA NA NA NA NA NA NA NA NA ...
## $ Season : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
## $ Month : Factor w/ 20 levels "0","1","2","3",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ Diseases : Factor w/ 4 levels "0","1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
## $ Red.Fox : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## $ Dingo : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## $ Feral.Cat : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## $ Whistling.Kite : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## $ Wallaby.Sp : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## $ Transect : Factor w/ 8 levels "a","b","c","d",..: 1 1 1 1 1 1 1 1 1 1 ...
str(EW)
## 'data.frame': 6000 obs. of 38 variables:
## $ Occurrence_ID : int 683808 683808 683809 683809 683809 683808 683809 683809 683809 683809 ...
## $ Lat : num -37.1 -37.1 -38.4 -38.4 -38.4 ...
## $ Long : num 148 148 145 145 145 ...
## $ Occurences : int 33 33 24 24 24 33 24 24 24 24 ...
## $ Abund.1 : int 33 33 24 24 24 33 24 24 24 24 ...
## $ Abund.2 : num NA NA NA NA NA NA NA NA NA NA ...
## $ Abund.3 : num 0.000514 0.000514 0.000374 0.000374 0.000374 ...
## $ No.of.10km.cells : int 6425 6425 6425 6425 6425 6425 6425 6425 6425 6425 ...
## $ Year : int 1760 1760 1760 1760 1760 1760 1760 1760 1760 1760 ...
## $ Day : int NA NA NA NA NA NA NA NA NA NA ...
## $ A_Prec_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_Psea_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_TAvg_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_TMax_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_TMin_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_TSea_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_TWet_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_TWrm_Avg30Yr : num NA NA NA NA NA NA NA NA NA NA ...
## $ A_Prec_AvgAutumn30Yr: num NA NA NA NA NA NA NA NA NA NA ...
## $ A_Prec_AvgSummer30Yr: num NA NA NA NA NA NA NA NA NA NA ...
## $ A_Prec_AvgSpring30Yr: num NA NA NA NA NA NA NA NA NA NA ...
## $ A_Prec_AvgWinter30Yr: num NA NA NA NA NA NA NA NA NA NA ...
## $ DistPermWater : num NA NA NA NA NA NA NA NA NA NA ...
## $ DistAgriLand : num NA NA NA NA NA NA NA NA NA NA ...
## $ PercSoilClay : num NA NA NA NA NA NA NA NA NA NA ...
## $ MinDayLength : num NA NA NA NA NA NA NA NA NA NA ...
## $ VarDayLength : num NA NA NA NA NA NA NA NA NA NA ...
## $ State : Factor w/ 8 levels "ACT","NSW","NT",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ VegeType : Factor w/ 13 levels "1","2","3","4",..: NA NA NA NA NA NA NA NA NA NA ...
## $ Season : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
## $ Month : Factor w/ 20 levels "0","1","2","3",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ Diseases : Factor w/ 4 levels "0","1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
## $ Red.Fox : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## $ Dingo : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## $ Feral.Cat : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## $ Whistling.Kite : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## $ Wallaby.Sp : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## $ Transect : Factor w/ 6 levels "a","b","c","d",..: 1 1 1 1 1 1 1 1 1 1 ...
#Check the Data
summary(NS)
## Occurrence_ID Lat Long Occurences
## Min. : 127 Min. :-43.25 Min. :113.0 Min. : 1.0
## 1st Qu.:682545 1st Qu.:-37.95 1st Qu.:144.2 1st Qu.: 5.0
## Median :683809 Median :-37.15 Median :145.3 Median : 15.0
## Mean :666408 Mean :-36.19 Mean :143.9 Mean : 324.9
## 3rd Qu.:686922 3rd Qu.:-35.55 3rd Qu.:146.7 3rd Qu.: 33.0
## Max. :689285 Max. :-12.45 Max. :153.1 Max. :5121.0
##
## Abund.1 Abund.2 Abund.3 No.of.10km.cells
## Min. : 1.0 Min. : NA Min. : 0.00002 Min. : 1
## 1st Qu.: 5.0 1st Qu.: NA 1st Qu.: 0.00011 1st Qu.:6425
## Median : 15.0 Median : NA Median : 0.00037 Median :6425
## Mean : 324.9 Mean :NaN Mean : 14.73780 Mean :4973
## 3rd Qu.: 33.0 3rd Qu.: NA 3rd Qu.: 0.00173 3rd Qu.:6425
## Max. :5121.0 Max. : NA Max. :256.05000 Max. :6425
## NA's :8000
## Year Day A_Prec_Avg30Yr A_Psea_Avg30Yr
## Min. :1760 Min. : 1.00 Min. : 148.7 Min. :15.12
## 1st Qu.:1760 1st Qu.: 9.00 1st Qu.: 299.8 1st Qu.:21.19
## Median :1760 Median :15.00 Median : 299.8 Median :21.19
## Mean :1830 Mean :14.94 Mean : 401.4 Mean :25.70
## 3rd Qu.:1900 3rd Qu.:23.00 3rd Qu.: 309.1 3rd Qu.:21.19
## Max. :1971 Max. :31.00 Max. :1438.0 Max. :62.49
## NA's :7507 NA's :7701 NA's :7701
## A_TAvg_Avg30Yr A_TMax_Avg30Yr A_TMin_Avg30Yr A_TSea_Avg30Yr
## Min. : 8.56 Min. :21.01 Min. :-1.140 Min. :282.3
## 1st Qu.:15.86 1st Qu.:31.24 1st Qu.: 2.780 1st Qu.:554.6
## Median :15.86 Median :31.24 Median : 2.780 Median :563.7
## Mean :15.59 Mean :30.12 Mean : 3.416 Mean :516.9
## 3rd Qu.:15.86 3rd Qu.:31.24 3rd Qu.: 3.480 3rd Qu.:563.7
## Max. :21.97 Max. :38.37 Max. : 9.930 Max. :655.6
## NA's :7701 NA's :7701 NA's :7701 NA's :7701
## A_TWet_Avg30Yr A_TWrm_Avg30Yr A_Prec_AvgAutumn30Yr A_Prec_AvgSummer30Yr
## Min. : 3.52 Min. :14.34 Min. : 9.57 Min. :13.64
## 1st Qu.:12.47 1st Qu.:22.87 1st Qu.: 19.88 1st Qu.:25.99
## Median :12.47 Median :22.87 Median : 19.88 Median :25.99
## Mean :12.36 Mean :22.01 Mean : 28.07 Mean :30.13
## 3rd Qu.:12.47 3rd Qu.:22.87 3rd Qu.: 21.90 3rd Qu.:25.99
## Max. :29.87 Max. :29.87 Max. :105.40 Max. :88.57
## NA's :7701 NA's :7701 NA's :7701 NA's :7701
## A_Prec_AvgSpring30Yr A_Prec_AvgWinter30Yr DistPermWater DistAgriLand
## Min. : 11.69 Min. : 8.97 Min. : 0.410 Min. : 0.69
## 1st Qu.: 28.46 1st Qu.: 26.81 1st Qu.: 5.350 1st Qu.: 11.66
## Median : 28.46 Median : 26.81 Median : 5.350 Median : 35.73
## Mean : 37.11 Mean : 39.67 Mean : 5.222 Mean : 28.53
## 3rd Qu.: 28.70 3rd Qu.: 29.55 3rd Qu.: 5.350 3rd Qu.: 35.73
## Max. :141.27 Max. :150.06 Max. :60.240 Max. :363.88
## NA's :7701 NA's :7701 NA's :7701 NA's :7701
## PercSoilClay MinDayLength VarDayLength State VegeType
## Min. : 5.52 Min. : 9.470 Min. :1.350 ACT :8000 4 : 217
## 1st Qu.:27.58 1st Qu.:10.030 1st Qu.:2.140 NSW : 0 11 : 41
## Median :27.60 Median :10.030 Median :2.140 NT : 0 2 : 14
## Mean :24.54 Mean : 9.941 Mean :2.344 QLD : 0 7 : 13
## 3rd Qu.:27.60 3rd Qu.:10.030 3rd Qu.:2.140 SA : 0 3 : 12
## Max. :40.25 Max. :10.460 Max. :3.450 TAS : 0 (Other): 2
## NA's :7701 NA's :7701 NA's :7701 (Other): 0 NA's :7701
## Season Month Diseases Red.Fox Dingo Feral.Cat Whistling.Kite
## 1 :5465 1 :5099 0:7176 0:1868 0:1708 0:1327 0: 801
## 2 : 277 12 : 268 1: 824 1:6132 1:6292 1:6673 1:7199
## 3 : 405 9 : 215 2: 0
## 4 : 457 7 : 190 3: 0
## NA's:1396 11 : 186
## (Other): 714
## NA's :1328
## Wallaby.Sp Transect
## 0: 890 a :1000
## 1:7110 b :1000
## c :1000
## d :1000
## e :1000
## f :1000
## (Other):2000
summary(EW)
## Occurrence_ID Lat Long Occurences
## Min. :683808 Min. :-38.35 Min. :145.3 Min. :24.0
## 1st Qu.:683808 1st Qu.:-38.35 1st Qu.:145.3 1st Qu.:24.0
## Median :683808 Median :-37.15 Median :148.2 Median :33.0
## Mean :683809 Mean :-37.75 Mean :146.8 Mean :28.5
## 3rd Qu.:683809 3rd Qu.:-37.15 3rd Qu.:148.2 3rd Qu.:33.0
## Max. :683809 Max. :-37.15 Max. :148.2 Max. :33.0
##
## Abund.1 Abund.2 Abund.3 No.of.10km.cells
## Min. :24.0 Min. : NA Min. :0.0003735 Min. :6425
## 1st Qu.:24.0 1st Qu.: NA 1st Qu.:0.0003735 1st Qu.:6425
## Median :33.0 Median : NA Median :0.0005136 Median :6425
## Mean :28.5 Mean :NaN Mean :0.0004437 Mean :6425
## 3rd Qu.:33.0 3rd Qu.: NA 3rd Qu.:0.0005136 3rd Qu.:6425
## Max. :33.0 Max. : NA Max. :0.0005136 Max. :6425
## NA's :6000
## Year Day A_Prec_Avg30Yr A_Psea_Avg30Yr A_TAvg_Avg30Yr
## Min. :1760 Min. : NA Min. : NA Min. : NA Min. : NA
## 1st Qu.:1760 1st Qu.: NA 1st Qu.: NA 1st Qu.: NA 1st Qu.: NA
## Median :1760 Median : NA Median : NA Median : NA Median : NA
## Mean :1760 Mean :NaN Mean :NaN Mean :NaN Mean :NaN
## 3rd Qu.:1760 3rd Qu.: NA 3rd Qu.: NA 3rd Qu.: NA 3rd Qu.: NA
## Max. :1760 Max. : NA Max. : NA Max. : NA Max. : NA
## NA's :6000 NA's :6000 NA's :6000 NA's :6000
## A_TMax_Avg30Yr A_TMin_Avg30Yr A_TSea_Avg30Yr A_TWet_Avg30Yr A_TWrm_Avg30Yr
## Min. : NA Min. : NA Min. : NA Min. : NA Min. : NA
## 1st Qu.: NA 1st Qu.: NA 1st Qu.: NA 1st Qu.: NA 1st Qu.: NA
## Median : NA Median : NA Median : NA Median : NA Median : NA
## Mean :NaN Mean :NaN Mean :NaN Mean :NaN Mean :NaN
## 3rd Qu.: NA 3rd Qu.: NA 3rd Qu.: NA 3rd Qu.: NA 3rd Qu.: NA
## Max. : NA Max. : NA Max. : NA Max. : NA Max. : NA
## NA's :6000 NA's :6000 NA's :6000 NA's :6000 NA's :6000
## A_Prec_AvgAutumn30Yr A_Prec_AvgSummer30Yr A_Prec_AvgSpring30Yr
## Min. : NA Min. : NA Min. : NA
## 1st Qu.: NA 1st Qu.: NA 1st Qu.: NA
## Median : NA Median : NA Median : NA
## Mean :NaN Mean :NaN Mean :NaN
## 3rd Qu.: NA 3rd Qu.: NA 3rd Qu.: NA
## Max. : NA Max. : NA Max. : NA
## NA's :6000 NA's :6000 NA's :6000
## A_Prec_AvgWinter30Yr DistPermWater DistAgriLand PercSoilClay
## Min. : NA Min. : NA Min. : NA Min. : NA
## 1st Qu.: NA 1st Qu.: NA 1st Qu.: NA 1st Qu.: NA
## Median : NA Median : NA Median : NA Median : NA
## Mean :NaN Mean :NaN Mean :NaN Mean :NaN
## 3rd Qu.: NA 3rd Qu.: NA 3rd Qu.: NA 3rd Qu.: NA
## Max. : NA Max. : NA Max. : NA Max. : NA
## NA's :6000 NA's :6000 NA's :6000 NA's :6000
## MinDayLength VarDayLength State VegeType Season
## Min. : NA Min. : NA ACT :6000 1 : 0 1:6000
## 1st Qu.: NA 1st Qu.: NA NSW : 0 2 : 0 2: 0
## Median : NA Median : NA NT : 0 3 : 0 3: 0
## Mean :NaN Mean :NaN QLD : 0 4 : 0 4: 0
## 3rd Qu.: NA 3rd Qu.: NA SA : 0 5 : 0
## Max. : NA Max. : NA TAS : 0 (Other): 0
## NA's :6000 NA's :6000 (Other): 0 NA's :6000
## Month Diseases Red.Fox Dingo Feral.Cat Whistling.Kite Wallaby.Sp
## 1 :6000 0:6000 0: 0 0: 0 0: 0 0: 0 0: 0
## 0 : 0 1: 0 1:6000 1:6000 1:6000 1:6000 1:6000
## 2 : 0 2: 0
## 3 : 0 3: 0
## 4 : 0
## 5 : 0
## (Other): 0
## Transect
## a:1000
## b:1000
## c:1000
## d:1000
## e:1000
## f:1000
##
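Both summaries show every sampled row in the ACT, with EW drawn from only two records dated 1760, which is not what transects spread across the country should look like. A likely cause is that `sample(nrow(Rabbit[cond, ]), ...)` draws numbers between 1 and the count of matching rows, and those numbers are then used to index the full Rabbit data frame, so only its first rows are ever picked. Exact equality on decimal coordinates will also match very few points. The following is a minimal sketch of one alternative, which samples the row positions returned by `which()` within a tolerance band. The function name `sample_band`, the `half_width` value and the toy data frame are assumptions for illustration.

```r
set.seed(1)  # makes the random draw reproducible

# Toy stand-in for Rabbit (coordinates are made up)
toy = data.frame(
  Long = c(153.60, 153.70, 148.66, 148.60, 120.00),
  Lat  = c(-30.00, -25.00, -35.00, -20.00, -32.00),
  Occurences = c(3, 10, 7, 1, 4)
)

# Sample n rows whose coordinate lies within half_width of centre
sample_band = function(data, coord, centre, half_width, n) {
  idx = which(abs(data[[coord]] - centre) <= half_width)  # positions in the full data frame
  if (length(idx) == 0) return(data[0, ])                 # no points fall in this band
  # Sampling positions within idx avoids sample()'s special case for a single number
  data[idx[sample.int(length(idx), n, replace = TRUE)], ]
}

NS2 = sample_band(toy, "Long", centre = 148.65, half_width = 0.1, n = 4)
NS2$Transect = factor("b")
NS2
```

The same function covers East-West transects by passing `"Lat"` as `coord`, with a different `centre` for each transect.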
This concludes the “R Environment Set Up and Importing the Data” section. Next, we move on to “Initial Graphical Data Exploration and Research Questions”.
In this section I will be making scatter plots, box plots and co-plots of the predictor variables in the various data frames to get a sense of which predictors may vary with Occurrences. As I will be making a lot of plots of the same type I will create plotting functions to use in for loops that will speed up the creation of the plots.
scatterplot_fun = function(data, x, y, na.rm = TRUE){
ggplot(data = data, aes(x = .data[[x]], y = .data[[y]])) +
geom_point() +
geom_smooth(method = "loess", se = FALSE, colour = "red") +
theme_classic()
}
boxplot_fun = function(data, x, y, na.rm = TRUE){
ggplot(data = data, aes(x = .data[[x]], y = .data[[y]])) +
geom_boxplot() +
theme_classic()
}
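boxplot_fun2 = function(data, x, y, z, na.rm = TRUE){
#interaction() combines the two factors into one grouping; factors cannot be multiplied
ggplot(data = data, aes(x = interaction(.data[[x]], .data[[z]]), y = .data[[y]])) +
geom_boxplot() +
theme_classic()
}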
coplot_fun = function(data, x, y, z, na.rm = TRUE){
gg_coplot(data = data, x = .data[[x]], y = .data[[y]], faceting = .data[[z]], loess_family = "symmetric", size = 2) +
theme_classic()
}
I will start with the Australia-scale data set Rabbit, where I will plot Occurences against Year and each of the continuous variables that follow it in the data frame.
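#Columns 9 to 27 are the continuous variables, from Year to VarDayLength
for(i in Rabbit[, 9:27]){
#theme_classic() must be added inside print() to be applied to the plot
print(ggplot(Rabbit, aes(x = i, y = Occurences)) +
geom_point() + theme_classic())
Sys.sleep(1)
}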
The data are very bunched together when plotted against occurrence, so I will log-transform Occurences and re-plot the graphs.
for(i in Rabbit[, 9:27]){
print(ggplot(Rabbit, aes(x = i, y = log(Occurences))) +
geom_point() + theme_classic())
Sys.sleep(1)
}
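Looping over the columns themselves, as above, passes a vector called `i` to `aes()`, so every plot has its x axis labelled `i`. A sketch of an alternative that loops over column names and looks each column up with the `.data` pronoun, so that each plot can carry the variable name. The toy data frame and the `plots` list are illustrative assumptions.

```r
library(ggplot2)

# Toy stand-in for Rabbit (values are made up)
toy = data.frame(
  Occurences = c(1, 5, 20, 150, 33, 7),
  Year = c(1900, 1950, 1990, 2005, 2009, 2015),
  DistAgriLand = c(40.2, 12.5, 6.7, 0.5, 1.6, 20.2)
)

predictors = c("Year", "DistAgriLand")

# Build one scatter plot per predictor; .data[[v]] looks the column up by name
plots = lapply(predictors, function(v) {
  ggplot(toy, aes(x = .data[[v]], y = log(Occurences))) +
    geom_point() +
    labs(x = v) +  # label the axis with the variable name
    theme_classic()
})
names(plots) = predictors

# Print each plot in turn
for (p in plots) print(p)
```

Storing the plots in a named list also means an individual plot can be recalled later, for example `plots$Year`, without re-running the loop.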
The plots produced suggest that year, the precipitation variables and distance to the edge of the nearest agricultural land may have country-wide effects on rabbit occurrences, with the other plots suggesting no likely relationships. Next, I will plot boxplots of log(Occurences) against the factor variables.
for(i in Rabbit[, 28:37]){
boxplot(log(Occurences) ~ i, data = Rabbit)
Sys.sleep(1)
}
The boxplots suggest there is likely a difference in the number of rabbit occurrences according to State, VegeType, Season and Month. The other variables also likely show differences in rabbit occurrences, but I am sceptical of what they are showing. The boxplot for Diseases suggests that the number of rabbit occurrences is higher when more biological control diseases have been introduced. However, this may be driven by time: the fewer the introduced diseases, the further back in time the record is, and sampling effort decreases the further back in time we go. The boxplots for the presence/absence of potential predators and competitors seem to suggest that, on the country scale, the number of rabbit occurrences increases where these species are present. This could be a result of predators actually predating competitors of the invasive European rabbit, and of wallaby species competing more intensely with other herbivores.
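One way to examine whether Diseases is standing in for time is to cross-tabulate it against a coarse time variable. This is a sketch on made-up data; the Decade column and the toy values are assumptions. A table in which each disease level occupies its own block of decades would suggest that the two effects cannot be separated in a model.

```r
# Toy stand-in for Rabbit (values are made up)
toy = data.frame(
  Year = c(1900, 1905, 1955, 1960, 1998, 2001, 2010, 2012),
  Diseases = factor(c(0, 0, 1, 1, 2, 2, 2, 2))
)

# Collapse Year into decades
toy$Decade = floor(toy$Year / 10) * 10

# Rows are disease levels, columns are decades; cells count observations
confound_tab = table(Diseases = toy$Diseases, Decade = toy$Decade)
confound_tab

# Number of observations per decade, a rough proxy for sampling effort
colSums(confound_tab)
```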
Now let’s look at some potential interactions between the variables. I will first use co-plots to look at interactions between the continuous and factor variables.
#for(i in Rabbit[, 9:27]){
# for(j in Rabbit[, 28:37]){
# coplot(Occurences ~ i | j, rows = 1, data = Rabbit)
# Sys.sleep(1)
# }
#}
This nested for loop takes a long time to run. Whilst I search for a solution, I will do the scatter and box plots for the other scales first and then come back to the co-plots.
I will now move on to the State/Territory scale, starting with the Australian Capital Territory (ACT). As the natural log scale was used at the Country scale, I will use it here and for all further graphing.
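A possible workaround, assuming the run time is dominated by the number of points drawn in each panel, is to co-plot a random subset of rows and to loop over column names so the formula carries the real variable names. This is only a sketch on simulated data; `toy`, `sub`, the subset size of 500 and the variable names are all made up.

```r
set.seed(1)

# Made-up stand-in for Rabbit
n <- 2000
toy <- data.frame(
  Occurences = rpois(n, 5) + 1,
  Rain       = runif(n, 100, 800),
  Temp       = runif(n, 5, 35),
  Season     = factor(sample(1:4, n, replace = TRUE)),
  State      = factor(sample(c("NSW", "VIC", "QLD"), n, replace = TRUE))
)

# Draw a random subset of rows so each co-plot has fewer points to render
sub <- toy[sample(nrow(toy), 500), ]

cont_vars   <- c("Rain", "Temp")
factor_vars <- c("Season", "State")

# Loop over names and build each formula as text, e.g. Occurences ~ Rain | Season
for (x in cont_vars) {
  for (z in factor_vars) {
    f <- as.formula(paste("Occurences ~", x, "|", z))
    coplot(f, data = sub, rows = 1)
  }
}
```

Subsampling changes what is displayed, so any pattern seen this way would need to be confirmed on the full data.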
#Scatter Plots
for(i in Rabbit_ACT[, 9:27]){
print(ggplot(Rabbit_ACT, aes(x = i, y = log(Occurences))) +
geom_point() + theme_classic())
Sys.sleep(1)
}
## Warning: Removed 4887 rows containing missing values (geom_point).
## Warning: Removed 1043 rows containing missing values (geom_point).
## Warning: Removed 1043 rows containing missing values (geom_point).
## Warning: Removed 1043 rows containing missing values (geom_point).
## Warning: Removed 1043 rows containing missing values (geom_point).
## Warning: Removed 1043 rows containing missing values (geom_point).
## Warning: Removed 1043 rows containing missing values (geom_point).
## Warning: Removed 1043 rows containing missing values (geom_point).
## Warning: Removed 1043 rows containing missing values (geom_point).
## Warning: Removed 1043 rows containing missing values (geom_point).
## Warning: Removed 1043 rows containing missing values (geom_point).
## Warning: Removed 1043 rows containing missing values (geom_point).
## Warning: Removed 1043 rows containing missing values (geom_point).
## Warning: Removed 1043 rows containing missing values (geom_point).
## Warning: Removed 1043 rows containing missing values (geom_point).
## Warning: Removed 1043 rows containing missing values (geom_point).
## Warning: Removed 1043 rows containing missing values (geom_point).
## Warning: Removed 1043 rows containing missing values (geom_point).
#Boxplots
for(i in Rabbit_ACT[, 28:37]){
boxplot(log(Occurences) ~ i, data = Rabbit_ACT)
Sys.sleep(1)
}
The possible trends found in the Australia-scale analysis were replicated for the ACT, where the precipitation variables, year, distance to the nearest agricultural land edge and distance to the nearest permanent water feature may vary with rabbit occurrences. The same possible trends in the factors seen at the Australia scale were seen at the ACT scale.
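The warnings printed with the ACT scatter plots report rows dropped for missing values. A quick way to see which variables are responsible is to count the missing values in each column before plotting. The sketch below uses a made-up data frame; `toy` and its values are illustrative and are not taken from Rabbit_ACT.

```r
# Made-up stand-in for a state subset such as Rabbit_ACT
toy <- data.frame(
  Occurences = c(3, 1, 7, 2, 5),
  Rain       = c(410, NA, 385, NA, 402),
  Temp       = c(NA, NA, 18.2, 19.1, NA),
  Year       = c(2001, 2003, 2004, 2008, 2010)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Same counts as a proportion of rows, sorted from most to least missing
na_share <- sort(na_counts / nrow(toy), decreasing = TRUE)
print(na_share)
```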
Next, I will look at the New South Wales data set.
#Scatter Plots
for(i in Rabbit_NSW[, 9:27]){
print(ggplot(Rabbit_NSW, aes(x = i, y = log(Occurences))) +
geom_point() + theme_classic())
Sys.sleep(1)
}
## Warning: Removed 1825 rows containing missing values (geom_point).
#Boxplots
for(i in Rabbit_NSW[, 28:37]){
boxplot(log(Occurences) ~ i, data = Rabbit_NSW)
Sys.sleep(1)
}
The possible trends found in the Australia-scale analysis were replicated for NSW, where the precipitation variables, year, distance to the nearest agricultural land edge and distance to the nearest permanent water feature may vary with rabbit occurrences. The same possible trends in the factors seen at the Australia scale were seen at the NSW scale, except for Diseases, where no trend is likely, and there are no whistling kite absences (a product of the method used to generate them).
Now I will move onto the Northern Territory data set.
#Scatter Plots
for(i in Rabbit_NT[, 9:27]){
print(ggplot(Rabbit_NT, aes(x = i, y = log(Occurences))) +
geom_point() + theme_classic())
Sys.sleep(1)
}
## Warning: Removed 2964 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
#Boxplots
for(i in Rabbit_NT[, 28:37]){
boxplot(log(Occurences) ~ i, data = Rabbit_NT)
Sys.sleep(1)
}
In the Northern Territory the precipitation, temperature, land use and day length variables may all vary with the number of rabbit occurrences. The season effect seen at the Australia scale may not exist in NT, and there were again no whistling kite absences, but all other factor trends seen in the other data sets appear similar in NT.
Next, I repeated the analysis for Queensland (QLD).
#Scatter Plots
for(i in Rabbit_QLD[, 9:27]){
print(ggplot(Rabbit_QLD, aes(x = i, y = log(Occurences))) +
geom_point() + theme_classic())
Sys.sleep(1)
}
## Warning: Removed 31287 rows containing missing values (geom_point).
## Warning: Removed 24 rows containing missing values (geom_point).
## Warning: Removed 24 rows containing missing values (geom_point).
## Warning: Removed 26 rows containing missing values (geom_point).
## Warning: Removed 4 rows containing missing values (geom_point).
## Warning: Removed 4 rows containing missing values (geom_point).
#Boxplots
for(i in Rabbit_QLD[, 28:33]){
boxplot(log(Occurences) ~ i, data = Rabbit_QLD)
Sys.sleep(1)
}
The plots for Queensland suggest the same trends as the Australia-wide data, except that there is no trend in year, likely due to the relatively greater sampling effort in QLD compared to the other states. It was not possible to make box plots for the animal species except for red foxes and dingoes.
Now I will move on to South Australia. There are some issues with this data set to note: only VegeType can be plotted as a box plot, as Diseases, Season and Month each have only one level, and there is no data on any of the animal species.
#Scatter Plots
for(i in Rabbit_SA[, 9:27]){
print(ggplot(Rabbit_SA, aes(x = i, y = log(Occurences))) +
geom_point() + theme_classic())
Sys.sleep(1)
}
#Boxplots
boxplot(log(Occurences) ~ VegeType, data = Rabbit_SA)
There are plotting issues that I need to resolve later; they are due in part to there being no data for some variables and to quality issues with the data that is there, as the occurrences are all the same value.
Next, I move on to the Tasmania data set. In the TAS data set there is no data on the animal species, no variability in the number of occurrences and only one factor level for all factors with data except for VegeType.
For the Victoria data set there is again no data for the animal species and Diseases has only one factor level, but there is variability in occurrences and so the data are plottable.
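Rather than hand-picking which factor columns to plot for each state, the factors with at least two observed levels could be selected programmatically. The following is a minimal sketch on made-up data; `toy`, `n_levels` and `plottable` are hypothetical names, and the `Fox` column simply stands in for an animal presence/absence factor with no data.

```r
# Made-up stand-in for a state subset such as Rabbit_SA
toy <- data.frame(
  Occurences = c(2, 5, 3, 8, 4, 6),
  VegeType   = factor(c(1, 1, 4, 4, 7, 7)),
  Season     = factor(c(2, 2, 2, 2, 2, 2)),           # one level only
  Fox        = factor(rep(NA, 6), levels = c(0, 1))   # no data at all
)

factor_vars <- names(toy)[sapply(toy, is.factor)]

# Count the levels that actually occur (droplevels removes unused levels)
n_levels <- sapply(toy[factor_vars], function(f) nlevels(droplevels(f)))

# Keep only factors that can show a contrast in a boxplot
plottable <- factor_vars[n_levels >= 2]

for (nm in plottable) {
  boxplot(log(toy$Occurences) ~ toy[[nm]], xlab = nm, ylab = "log(Occurences)")
}
```

In this toy example only `VegeType` is kept, which mirrors the situation described for South Australia.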
#Scatter Plots
for(i in Rabbit_VIC[, 9:27]){
print(ggplot(Rabbit_VIC, aes(x = i, y = log(Occurences))) +
geom_point() + theme_classic())
Sys.sleep(1)
}
## Warning: Removed 12515 rows containing missing values (geom_point).
## Warning: Removed 19867 rows containing missing values (geom_point).
## Warning: Removed 19867 rows containing missing values (geom_point).
## Warning: Removed 19867 rows containing missing values (geom_point).
## Warning: Removed 19867 rows containing missing values (geom_point).
## Warning: Removed 19867 rows containing missing values (geom_point).
## Warning: Removed 19867 rows containing missing values (geom_point).
## Warning: Removed 19867 rows containing missing values (geom_point).
## Warning: Removed 19867 rows containing missing values (geom_point).
## Warning: Removed 19867 rows containing missing values (geom_point).
## Warning: Removed 19867 rows containing missing values (geom_point).
## Warning: Removed 19867 rows containing missing values (geom_point).
## Warning: Removed 19867 rows containing missing values (geom_point).
## Warning: Removed 6 rows containing missing values (geom_point).
## Warning: Removed 6 rows containing missing values (geom_point).
## Warning: Removed 8 rows containing missing values (geom_point).
#Boxplots
for(i in Rabbit_VIC[, 28:31]){
boxplot(log(Occurences) ~ i, data = Rabbit_VIC)
Sys.sleep(1)
}
For the variables that could be plotted there are trends in all variables as was the case in NT.
For the Western Australia data set there is again no data for the animal species, precipitation variables or temperature variables, and Diseases has only one factor level, but there is variability in occurrences and so the data are plottable.
#Scatter Plots
for(i in Rabbit_WA[, 9:27]){
print(ggplot(Rabbit_WA, aes(x = i, y = log(Occurences))) +
geom_point() + theme_classic())
Sys.sleep(1)
}
## Warning: Removed 16774 rows containing missing values (geom_point).
## Warning: Removed 16770 rows containing missing values (geom_point).
## Warning: Removed 18385 rows containing missing values (geom_point).
## Warning: Removed 18385 rows containing missing values (geom_point).
## Warning: Removed 18385 rows containing missing values (geom_point).
## Warning: Removed 18385 rows containing missing values (geom_point).
## Warning: Removed 18385 rows containing missing values (geom_point).
## Warning: Removed 18385 rows containing missing values (geom_point).
## Warning: Removed 18385 rows containing missing values (geom_point).
## Warning: Removed 18385 rows containing missing values (geom_point).
## Warning: Removed 18385 rows containing missing values (geom_point).
## Warning: Removed 18385 rows containing missing values (geom_point).
## Warning: Removed 18385 rows containing missing values (geom_point).
## Warning: Removed 18385 rows containing missing values (geom_point).
#Boxplots
for(i in Rabbit_WA[, 28:31]){
boxplot(log(Occurences) ~ i, data = Rabbit_WA)
Sys.sleep(1)
}
There were trends with occurrence for all the variables that had data, but the trends were quite different from those seen in the other states/territories.
Now I will finally move on to the transect-scale data sets where the same exploratory data analysis of scatter, box and coplots will be done for each data set. I will start with the North-South transect data.
#Scatter Plots
for(i in NS[, 9:27]){
print(ggplot(NS, aes(x = i, y = log(Occurences))) +
geom_point() + theme_classic())
Sys.sleep(1)
}
## Warning: Removed 7507 rows containing missing values (geom_point).
## Warning: Removed 7701 rows containing missing values (geom_point).
## Warning: Removed 7701 rows containing missing values (geom_point).
## Warning: Removed 7701 rows containing missing values (geom_point).
## Warning: Removed 7701 rows containing missing values (geom_point).
## Warning: Removed 7701 rows containing missing values (geom_point).
## Warning: Removed 7701 rows containing missing values (geom_point).
## Warning: Removed 7701 rows containing missing values (geom_point).
## Warning: Removed 7701 rows containing missing values (geom_point).
## Warning: Removed 7701 rows containing missing values (geom_point).
## Warning: Removed 7701 rows containing missing values (geom_point).
## Warning: Removed 7701 rows containing missing values (geom_point).
## Warning: Removed 7701 rows containing missing values (geom_point).
## Warning: Removed 7701 rows containing missing values (geom_point).
## Warning: Removed 7701 rows containing missing values (geom_point).
## Warning: Removed 7701 rows containing missing values (geom_point).
## Warning: Removed 7701 rows containing missing values (geom_point).
## Warning: Removed 7701 rows containing missing values (geom_point).
#Boxplots
for(i in NS[, 28:38]){
boxplot(log(Occurences) ~ i, data = NS)
Sys.sleep(1)
}
In the North-South data there are possible trends in all of the continuous variables except year, day, distance to the edge of the nearest agricultural land and percentage clay in the soil. Something to note is that these possible trends appear more non-linear at this scale than at the Country and State/Territory scales. There were no likely trends between rabbit occurrences and the factor variables except for vegetation type (not all 13 types were present), season, month and red foxes. There was a possible effect of transect, something to note when it comes to modelling the data.
The EW data set has no data on continuous variables apart from abundance, and the factors have only one level or no levels at all, making this data set unusable.
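To judge by eye whether a trend is non-linear, a straight-line fit and a loess smoother can be overlaid on the same scatter plot. This is only a sketch on simulated data: `toy`, `Rain` and `LogOcc` are made up, and base R `loess()` is used here as a stand-in for whatever smoother is preferred.

```r
set.seed(42)

# Simulated predictor and a deliberately curved response
toy <- data.frame(Rain = runif(200, 100, 800))
toy$LogOcc <- sin(toy$Rain / 150) + rnorm(200, sd = 0.3)

# Straight-line fit and local regression (loess) fit to the same data
fit_lin <- lm(LogOcc ~ Rain, data = toy)
fit_lo  <- loess(LogOcc ~ Rain, data = toy)

# Scatter plot with both fits; a loess curve that departs clearly from the
# dashed line hints at a non-linear relationship
ord <- order(toy$Rain)
plot(toy$Rain, toy$LogOcc, xlab = "Rain", ylab = "log(Occurences)")
abline(fit_lin, lty = 2)
lines(toy$Rain[ord], predict(fit_lo)[ord])
```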